Scalable S-to-P Broadcasting on Message-Passing MPPs
Abstract
In s-to-p broadcasting, s processors in a p-processor machine contain a message to be broadcast to all the processors, 1 ≤ s ≤ p. We present a number of different broadcasting algorithms that handle all ranges of s. We show how the performance of each algorithm is influenced by the distribution of the s source processors and by the relationships between the distribution and the characteristics of the interconnection network. For the Intel Paragon we show that for each algorithm and machine dimension there exist ideal distributions and distributions on which the performance degrades. For the Cray T3D we also demonstrate dependencies between distributions and machine sizes. To reduce the dependence of the performance on the distribution of sources, we propose a repositioning approach. In this approach, the initial distribution is turned into an ideal distribution of the target broadcasting algorithm. We report experimental results for the Intel Paragon and Cray T3D and discuss scalability and performance.
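The dependence on source distribution described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not one of the paper's algorithms: it assumes p is a power of two, uses a generic recursive-doubling exchange in place of the paper's methods, and ignores the network topology entirely. The function name `simulate_s_to_p_broadcast` and its return values are hypothetical and chosen only for illustration.

```python
def simulate_s_to_p_broadcast(p, sources):
    """Simulate s-to-p broadcasting by recursive doubling (illustrative only).

    Assumption: p is a power of two. `sources` maps a source rank to its message.
    Returns (known, max_sent): known[i] is the dict of messages processor i holds
    at the end; max_sent[r] is the largest number of messages any single
    processor sent in round r.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two")
    if not 1 <= len(sources) <= p or any(not 0 <= r < p for r in sources):
        raise ValueError("need 1 <= s <= p sources with ranks in [0, p)")

    # Initially only the s source processors hold a message.
    known = [dict() for _ in range(p)]
    for rank, msg in sources.items():
        known[rank][rank] = msg

    max_sent = []
    step = 1
    while step < p:
        # Snapshot the state so every exchange in this round uses pre-round data.
        before = [dict(k) for k in known]
        for i in range(p):
            # Processor i receives everything its partner (i XOR step) held.
            known[i].update(before[i ^ step])
        # Each processor sends all messages it held before the round.
        max_sent.append(max(len(b) for b in before))
        step *= 2
    return known, max_sent


if __name__ == "__main__":
    for placement in ({0: "a", 1: "b"}, {0: "a", 4: "b"}):
        known, max_sent = simulate_s_to_p_broadcast(8, placement)
        complete = all(len(k) == len(placement) for k in known)
        print(sorted(placement), "complete:", complete, "max sent per round:", max_sent)
```

With p = 8 and s = 2, both placements deliver both messages to every processor, but adjacent sources (ranks 0 and 1) combine their messages in the first round and then send two messages per step, whereas sources at ranks 0 and 4 send only one message per step. In this toy model the placement alone changes the communication volume, which is the kind of effect a repositioning step is meant to control.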
Related papers
An Extension to MPI for Distributed Computing on MPPs
We present a tool that allows an MPI application to run on several MPPs without having to change the application code. PACX (PArallel Computer eXtension) provides the user with a distributed MPI environment with most of the important functionality of standard MPI. It is therefore well suited for use in metacomputing. We are going to show how two MPPs are configured by PACX into a single virtual ...
MPPs versus Clusters
In coming years, if not already, the parallel-processing community can expect to hear regularly from MPP advocates and cluster advocates about why their approach is better. Either pitch is apt to be a hard sell: hard to sell to an informed audience or reader, and dull. The attempt to distinguish between MPPs and clusters is in some cases an empty subject. By the term “cluster,” I mean a group o...
Low-Latency Message Passing on Workstation Clusters using SCRAMNet
Clusters of workstations have emerged as a popular platform for parallel and distributed computing. Commodity high speed networks which are used to connect workstation clusters provide high bandwidth, but also have high latency. SCRAMNet is an extremely low latency replicated non-coherent shared memory network, so far used only for real-time applications. This paper reports our early experience...
Portable and Scalable Algorithms for Irregular All-to-All Communication
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize the communication latency, the algorithm reduces the total number of messages transmitted, reduces the variance of the lengths of these messages, and overlaps the communication with computation. The performance of the algorithm is ...
Multidestination Message Passing in Wormhole K-ary N-cube Networks with Base Routing Conformed Paths
This paper proposes a novel multidestination message-passing mechanism for wormhole k-ary n-cube networks. Similar to the familiar car-pool concept, this mechanism allows data to be delivered to or picked up from multiple nodes with a single message-passing step. Such messages can propagate along any valid path in a wormhole network conforming to the underlying base routing scheme (d...
Journal: IEEE Trans. Parallel Distrib. Syst.
Volume: 9
Publication date: 1996